Updating existing QSAR models: selection and weighting of new data

Authors

  • Tomas Öberg
  • Tao Liu

Abstract

Computational chemistry and quantitative structure-activity relationships (QSAR) are expected to be used extensively in implementing the new REACH regulation for chemicals in Europe. However, for some compound groups there are too few data to permit both calibration and testing of a new model. Using previously developed models, or updating them, is then a viable alternative. Perfluorocarboxylic acids (PFCAs) and fluorotelomer alcohols (FTOHs) are two groups of environmentally relevant compounds with unique physical and chemical properties. The subcooled liquid vapour pressure (pL) is one such property, for which experimental determinations are limited and far from consistent [1]. Updating a model is, however, challenging when the new compounds lie far outside the original calibration domain. By carefully selecting and weighting only three new compounds, we were able to update a previously developed general QSAR model [2] so that it covers the new domain while maintaining predictive performance on the earlier calibration and test data. The optimal weighting scheme was determined from the sample leverages and residuals in the calibration phase [3]. When applied to an external test set of two PFCAs and four FTOHs with pL in the range 0.2-200 Pa, the recalibrated model greatly outperformed previous modelling attempts [4], with Q²ext = 0.994 and RMSEP = 0.190 log Pa units. Domain coverage also increased from 1% to 51% for 426 perfluoroalkylated compounds selected from the REACH registration list, the PhysProp database, and the OECD 2006 survey [5]. Selecting and weighting new calibration data can thus facilitate the extension and use of existing QSAR models. This investigation was supported by the EU FP7 project CADASTER (grant agreement no. 212668).
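The abstract relies on a few quantities it does not define: a weighted recalibration, sample leverages (used both for weighting and for judging domain membership), and the external statistics Q²ext and RMSEP. The sketch below illustrates these under stated assumptions. It assumes an ordinary multiple linear regression model (the general model in [2] may take a different form), treats the per-compound weights as user-supplied numbers (it does not reproduce the leverage- and residual-based weighting scheme of [3]), computes Q²ext with the training-set mean in the denominator (one of several published variants), and counts a compound as inside the domain when its leverage does not exceed h* = 3p/n, a common convention that may differ from the paper's. All function names and the toy data are hypothetical.

```python
# Hypothetical sketch, not the authors' implementation: assumes a linear
# (MLR) QSAR model, user-chosen weights, and the h* = 3p/n leverage limit.
from math import sqrt


def _solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def _design(x):
    """Prepend an intercept column to each row of descriptors."""
    return [[1.0] + [float(v) for v in xi] for xi in x]


def _xtwx(rows, w):
    """Weighted cross-product matrix X^T W X (W diagonal)."""
    p = len(rows[0])
    return [[sum(wk * r[i] * r[j] for r, wk in zip(rows, w)) for j in range(p)]
            for i in range(p)]


def fit_wls(x, y, w):
    """Weighted least squares: beta = (X^T W X)^-1 X^T W y, intercept first."""
    rows = _design(x)
    xtwy = [sum(wk * r[i] * yk for r, wk, yk in zip(rows, w, y))
            for i in range(len(rows[0]))]
    return _solve(_xtwx(rows, w), xtwy)


def predict(beta, x):
    """Linear predictions for descriptor rows x."""
    return [sum(b * v for b, v in zip(beta, r)) for r in _design(x)]


def leverages(x_cal, x_query):
    """Leverage h = x^T (X^T X)^-1 x of each query compound w.r.t. the calibration set."""
    rows = _design(x_cal)
    xtx = _xtwx(rows, [1.0] * len(rows))
    return [sum(a * b for a, b in zip(q, _solve(xtx, q))) for q in _design(x_query)]


def external_stats(y_obs, y_pred, y_train_mean):
    """Q^2_ext (variant centred on the training-set mean) and RMSEP."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss = sum((o - y_train_mean) ** 2 for o in y_obs)
    return 1.0 - press / ss, sqrt(press / len(y_obs))


def domain_coverage(h, n_cal, n_params, factor=3.0):
    """Fraction of compounds whose leverage is at or below h* = factor * p / n."""
    h_star = factor * n_params / n_cal
    return sum(hi <= h_star for hi in h) / len(h)


if __name__ == "__main__":
    # Hypothetical toy data: one descriptor; "old" compounds span x = 0..3,
    # one "new" compound lies far outside that range.
    x_old, y_old = [[0], [1], [2], [3]], [1.1, 2.9, 5.2, 6.8]
    x_new, y_new = [[10]], [21.0]
    w = [1.0] * len(x_old) + [3.0]  # hypothetical up-weighting of the new compound
    beta = fit_wls(x_old + x_new, y_old + y_new, w)
    print("coefficients:", beta)
    print("leverage of new compound vs. old set:", leverages(x_old, x_new))
```

Solving the normal equations with a small Gauss-Jordan routine keeps the sketch dependency-free; for real descriptor matrices, a QR- or SVD-based least-squares solver would be numerically safer. Giving a new compound a weight w behaves like including it w times in the calibration set, which is one way to pull a model toward a sparsely populated region of descriptor space.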

Similar sources

Pixel selection by successive projections algorithm method in multivariate image analysis for a QSAR study of antimicrobial activity for cephalosporins and design new cephalosporins

Thirty-one Cephalosporin compounds were modeled using the multivariate image analysis and applied to the quantitative structure activity relationship (MIA-QSAR) approach. The acid dissociation constants (pKa) of cephalosporins play a fundamental role in the mechanism of activity of cephalosporins. The antimicrobial activity of cephalosporins was related to their first pKa by different models. B...

FEM Updating for Offshore Jacket Structures Using Measured Incomplete Modal Data

Marine industry requires continued development of new technologies in order to produce oil. An essential requirement in design is to be able to compare experimental data from prototype structures with predicted information from a corresponding analytical finite element model. In this study, structural model updating may be defined as the fit of an existing analytical model in the light of measu...

Application of Genetic Algorithms for Pixel Selection in MIA-QSAR Studies on Anti-HIV HEPT Analogues for New Design Derivatives

Quantitative structure-activity relationship (QSAR) analysis has been carried out with a series of 107 anti-HIV HEPT compounds with antiviral activity, which was performed by chemometrics methods. Bi-dimensional images were used to calculate some pixels and multivariate image analysis was applied to QSAR modelling of the anti-HIV potential of HEPT analogues by means of multivariate calibration,...

A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection

K nearest neighbor algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affect the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...

Volume 2

Publication date: 2010